June 26, 2025
Thursday
All t-tests assume approximate normality of the data.
In the case of one-sample t-tests, the measure of interest must approximately follow a normal distribution.
In the case of two-sample t-tests, the measure of interest in each group must approximately follow a normal distribution.
Note that a paired t-test is technically a one-sample t-test on the differences, so we will examine the normality of the differences.
There are formal tests for normality (see article here); however, we will not use them.
Instead, we will assess normality using a quantile-quantile (Q-Q) plot.
A Q-Q plot helps us visually check if our data follows a specific distribution (here, the normal).
How do we read Q-Q plots?
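As a rough illustration of reading a Q-Q plot, the sketch below uses base R's qqnorm() and qqline() rather than the ssstats functions used in this course. The data are simulated and the variable names are made up for illustration only. Points that stay close to the reference line suggest approximate normality; systematic curvature away from the line suggests skew or heavy tails.

```r
# Simulated data for illustration only (not the course dataset)
set.seed(4173)
normal_sample <- rnorm(50, mean = 10, sd = 2)   # roughly normal
skewed_sample <- rexp(50, rate = 1)             # right-skewed

# Q-Q plots: sample quantiles against theoretical normal quantiles
par(mfrow = c(1, 2))
qqnorm(normal_sample, main = "Approximately normal")
qqline(normal_sample)
qqnorm(skewed_sample, main = "Right-skewed")
qqline(skewed_sample)

# The plotted points can also be extracted without drawing anything
qq_normal <- qqnorm(normal_sample, plot.it = FALSE)
qq_skewed <- qqnorm(skewed_sample, plot.it = FALSE)
```

In the left panel the points should hug the line; in the right panel they should bend away from it at the upper end.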
wing_flap %>% independent_mean_HT(grouping = target,
                                  continuous = apples,
                                  mu = 5,
                                  alternative = "greater",
                                  alpha = 0.05)
Two-sample t-test for two independent means and equal variance:
Null: H₀: μ₁ − μ₂ = 5
Alternative: H₁: μ₁ − μ₂ > 5
Test statistic: t(23) = 5.445
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p = < 0.001 < α = 0.05)
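As a quick check of the output above, the one-sided p-value can be recomputed in base R from the reported test statistic and degrees of freedom. This is only a sketch: the values 5.445 and 23 are copied from the output, and the variable names are made up for illustration.

```r
# Values copied from the output above
t_stat <- 5.445
df     <- 23
alpha  <- 0.05

# The alternative is "greater", so the p-value is the upper-tail area
p_value <- pt(t_stat, df = df, lower.tail = FALSE)

p_value < 0.001   # TRUE, matching "p < 0.001"
p_value < alpha   # TRUE, so we reject the null hypothesis
```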
We will use the independent_qq() function from library(ssstats) to assess normality.
Let’s now look at the normality assumption for our example.
How should we change the code for our dataset?
The t-tests we have already learned are considered parametric methods.
Nonparametric methods do not have distributional assumptions.
Why don’t we always use nonparametric methods?
They are often less efficient: a larger sample size is required to achieve the same power at a given probability of a Type I error.
They discard useful information :(
In the nonparametric tests we will be learning, the data will be ranked.
Let us first consider a simple example, x: 1, 7, 10, 2, 6, 8
Our first step is to reorder the data: x: 1, 2, 6, 7, 8, 10
Then, we replace with the ranks: R: 1, 2, 3, 4, 5, 6
What if not all data values are unique? We will assign the average rank to each group of tied values.
For example, x: 9, 8, 8, 0, 3, 4, 4, 8
Let’s reorder: x: 0, 3, 4, 4, 8, 8, 8, 9
Rank ignoring ties: R: 1, 2, 3, 4, 5, 6, 7, 8
Now, the final rank: R: 1, 2, 3.5, 3.5, 6, 6, 6, 8
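The two ranking examples above can be reproduced with base R's rank() function, which assigns average ranks to ties by default. This is a minimal sketch; the object names are made up for illustration. Note that rank() returns the ranks in the original order of the data, so we sort them to match the reordered lists above.

```r
# Example without ties
x1 <- c(1, 7, 10, 2, 6, 8)
r1 <- rank(x1)          # ranks in the original data order
r1                      # 1 4 6 2 3 5
sort(r1)                # 1 2 3 4 5 6

# Example with ties: tied values share the average of their ranks
x2 <- c(9, 8, 8, 0, 3, 4, 4, 8)
r2 <- rank(x2, ties.method = "average")   # "average" is the default
r2                      # 8 6 6 1 2 3.5 3.5 6
sort(r2)                # 1 2 3.5 3.5 6 6 6 8
```

The two 4s would have had ranks 3 and 4, so each receives (3 + 4) / 2 = 3.5; the three 8s would have had ranks 5, 6, and 7, so each receives 6.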
Hypotheses
Test Statistic & p-Value
Rejection Region
Conclusion/Interpretation
[Reject or fail to reject] H_0.
There [is or is not] sufficient evidence to suggest [alternative hypothesis in words].
Before ranking, we will find the difference between the paired observations and eliminate any 0 differences.
Note 1: eliminating 0 differences is the big difference from the other tests!
Note 2: because we are eliminating 0 differences, our sample size becomes the number of pairs with a non-0 difference.
When ranking, the differences are ranked based on the absolute value of the difference.
We also keep the sign of the difference.
| X | Y | D | \|D\| | Rank |
|---|---|---|---|---|
| 5 | 8 | -3 | 3 | -1.5 |
| 8 | 5 | 3 | 3 | +1.5 |
| 4 | 4 | 0 | 0 | (dropped) |
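The signed ranks in the table above can be sketched in base R as follows. This only illustrates the ranking step, not the full test, and the object names are made up for illustration.

```r
# Paired observations from the table above
x <- c(5, 8, 4)
y <- c(8, 5, 4)

d <- x - y                 # differences: -3, 3, 0
d_nonzero <- d[d != 0]     # eliminate 0 differences
n <- length(d_nonzero)     # updated sample size: 2

# Rank the absolute differences (ties get the average rank),
# then reattach the sign of each difference
signed_rank <- sign(d_nonzero) * rank(abs(d_nonzero))
signed_rank                # -1.5  1.5
```

Both non-0 differences have an absolute value of 3, so they tie for ranks 1 and 2 and each receives 1.5 before the sign is reattached.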
Hypotheses
Test Statistic & p-Value
Rejection Region
Conclusion/Interpretation
[Reject or fail to reject] H_0.
There [is or is not] sufficient evidence to suggest [alternative hypothesis in words].
STA4173 - Biostatistics - Summer 2025